2018-12-11

Agenda

  • Introduction

  • Utility functions

  • Visualization tools

  • Seasonal analysis

  • Forecasting applications

  • Road map

Any experience with time series analysis?

forecast package?

plotly package?

Introduction

The TSstudio package provides a set of functions for time series analysis and forecasting such as:

  • Utility functions for pre-processing time series data
  • Interactive data visualization tools for descriptive analysis, based on the plotly package engine
  • A set of functions for predictive analysis and forecasting automation, using models from the forecast, forecastHybrid, and bsts packages

The primary goal of the package is to simplify the analysis workflow (in short: minimum code, maximum results).

Package structure

Installation

Install from CRAN:

install.packages("TSstudio")

Or from GitHub:

devtools::install_github("RamiKrispin/TSstudio")
library(TSstudio)

Utility functions

The ts_info function returns the main characteristics of a series:

data("USgas")
ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 225 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2018 9
data("Michigan_CS")
ts_info(Michigan_CS)
##  The Michigan_CS series is a xts object with 1 variable and 465 observations
##  Frequency: monthly 
##  Start time: Jan 1980 
##  End time: Sep 2018
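
For readers without the package at hand, roughly the same characteristics can be read off a ts object with base R accessors. This is only a sketch, not how ts_info is implemented: describe_ts is a hypothetical helper name, and the built-in AirPassengers series stands in for USgas.

```r
# Hypothetical helper (not part of TSstudio): summarize a univariate
# ts object using base R accessors only.
describe_ts <- function(x) {
  stopifnot(is.ts(x))
  list(
    class     = class(x)[1],
    n_obs     = length(x),     # number of observations (univariate series)
    frequency = frequency(x),  # observations per cycle, e.g. 12 for monthly
    start     = start(x),      # c(cycle, unit), e.g. c(year, month)
    end       = end(x)
  )
}

info <- describe_ts(AirPassengers)
str(info)
```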

Utility functions

The ts_to_prophet function converts a time series object (e.g., ts, xts, zoo) to the prophet input structure:

USgas_prophet <- ts_to_prophet(USgas)

head(USgas_prophet)
##           ds      y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1
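
The two-column layout above (a date column ds and a value column y) can be approximated in base R for a monthly ts object. A minimal sketch, assuming a monthly series; monthly_ts_to_df is a hypothetical name and this is not the package's implementation.

```r
# Hypothetical helper (not part of TSstudio): convert a monthly ts object
# to a data frame with a Date column `ds` and a numeric column `y`.
monthly_ts_to_df <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  first <- as.integer(start(x))  # c(year, month) of the first observation
  first_day <- as.Date(sprintf("%d-%02d-01", first[1], first[2]))
  data.frame(
    ds = seq(first_day, by = "month", length.out = length(x)),
    y  = as.numeric(x)
  )
}

df <- monthly_ts_to_df(AirPassengers)
head(df)
```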

Utility functions

The ts_split function splits a time series object into training and testing partitions:

USgas_split <- ts_split(USgas, sample.out = 12)

train <- USgas_split$train
test <- USgas_split$test

Utility functions

The ts_split function splits a time series object into training and testing partitions:

ts_info(train)
##  The train series is a ts object with 1 variable and 213 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2017 9
ts_info(test)
##  The test series is a ts object with 1 variable and 12 observations
##  Frequency: 12 
##  Start time: 2017 10 
##  End time: 2018 9
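
The same kind of split can be sketched with stats::window, holding out the last sample.out observations as the testing partition. split_ts is a hypothetical helper for illustration, not the package's code, and AirPassengers stands in for USgas.

```r
# Hypothetical helper (not part of TSstudio): hold out the last
# `sample.out` observations of a ts object as the testing partition.
split_ts <- function(x, sample.out) {
  stopifnot(is.ts(x), sample.out > 0, sample.out < length(x))
  tt <- as.numeric(time(x))  # time index of every observation
  n  <- length(x)
  list(
    train = window(x, end   = tt[n - sample.out]),
    test  = window(x, start = tt[n - sample.out + 1])
  )
}

parts <- split_ts(AirPassengers, sample.out = 12)
end(parts$train)   # last period of the training partition
start(parts$test)  # first period of the testing partition
```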

Utility functions

Other useful utility functions:

  • xts_to_ts() - converts an xts object to a ts object
  • zoo_to_ts() - converts a zoo object to a ts object
  • ts_sum() - sums multiple time series objects
  • ts_reshape() - transforms a time series object to a data frame format
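
To give a feel for what reshaping a series means, here is a base R sketch that lays a monthly ts object out as a year-by-month table. It assumes the series covers complete years; the layout is illustrative and not necessarily the one ts_reshape returns.

```r
x <- AirPassengers

# This sketch assumes complete cycles: the series starts at unit 1
# and its length is a multiple of the frequency.
stopifnot(start(x)[2] == 1, length(x) %% frequency(x) == 0)

years <- start(x)[1] + seq_len(length(x) / frequency(x)) - 1

# One row per year, one column per month
wide <- matrix(
  as.numeric(x),
  ncol = frequency(x),
  byrow = TRUE,
  dimnames = list(year = as.character(years), month = month.abb)
)

wide[1:3, 1:6]
```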

Visualization tools

The ts_plot function plots time series objects, supporting multiple formats (i.e., ts, xts, zoo, data.frame, tbl):

ts_plot(USgas)

Visualization tools

It is fairly simple to customize the plot:

ts_plot(USgas, Ytitle = "Billion Cubic Feet", 
        title = "Monthly Natural Gas Consumption in the US",
        slider = TRUE, color = "green") 

Visualization tools

All the visualization outputs are plotly objects:

p <- ts_plot(USgas, Ytitle = "Billion Cubic Feet", 
        title = "Monthly Natural Gas Consumption in the US")
class(p)
## [1] "plotly"     "htmlwidget"

Visualization tools

Therefore, you can apply any of the plotly functions, and customize the object accordingly:

library(plotly)
p %>% layout(font = list(color = "white"),
       plot_bgcolor = "black", paper_bgcolor = "black")

Seasonal analysis

The package provides a set of functions for seasonal analysis, such as:

  • ts_seasonal() - a view of the series by its frequency units, applicable for a series with daily frequency and above (e.g., monthly, quarterly)
  • ts_heatmap() - a heatmap of time series data, supporting series with half-hourly frequency and above
  • ts_surface() - a 3D view of the series by the frequency units (e.g., the month of the year), the cycle units (e.g., the year), and the series values
  • ts_polar() - a polar plot of time series data, applicable for monthly or quarterly series
  • ts_quantile() - quantile plots of time series data
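
The idea behind these views is to group observations by their frequency unit. A minimal base R sketch of that grouping, using the built-in AirPassengers series; it only computes the per-month averages and does not reproduce any of the plots.

```r
x <- AirPassengers

# cycle() gives the position of each observation within its cycle
# (1 to 12 for a monthly series); average the series by that position.
monthly_avg <- tapply(as.numeric(x), as.integer(cycle(x)), mean)
names(monthly_avg) <- month.abb

round(monthly_avg, 1)
```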

Seasonal analysis

ts_seasonal(USgas, type = "all",
            palette_normal = "inferno")

Seasonal analysis

ts_heatmap(USgas, color = "Greens")

Seasonal analysis

ts_surface(USgas)

Analyzing series with high frequency

library(UKgrid)

UKgrid_hourly <- extract_grid(type = "xts", 
                             columns = "ND", 
                             aggregate = "hourly", 
                             start = 2015)

ts_info(UKgrid_hourly)
##  The UKgrid_hourly series is a xts object with 1 variable and 31392 observations
##  Frequency: hourly 
##  Start time: 2015-01-01 
##  End time: 2018-07-31 23:00:00

Analyzing series with high frequency

ts_plot(UKgrid_hourly)

Analyzing series with high frequency

ts_quantile(UKgrid_hourly)

Analyzing series with high frequency

ts_quantile(UKgrid_hourly, period = "weekdays", n = 2)

Analyzing series with high frequency

ts_quantile(UKgrid_hourly, period = "monthly", n = 2)
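
What a quantile view summarizes can be sketched in base R: group the observations by hour of the day and compute quantiles within each group. The series below is synthetic (a sine-shaped daily pattern plus noise), generated only for illustration; it is not the UKgrid data, and this is not how ts_quantile is implemented.

```r
set.seed(1)

# Synthetic hourly series: four weeks of hourly timestamps
stamps <- seq(as.POSIXct("2018-01-01 00:00", tz = "UTC"),
              by = "hour", length.out = 24 * 28)
hour <- as.integer(format(stamps, "%H"))

# Made-up demand values with a daily cycle plus random noise
demand <- 30000 + 5000 * sin(2 * pi * (hour - 6) / 24) +
  rnorm(length(stamps), sd = 500)

# Quantiles of the series for each hour of the day:
# rows are the quantiles, columns are the hours 0 to 23
q <- sapply(split(demand, hour), quantile, probs = c(0.25, 0.5, 0.75))

round(q[, 1:4])
```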

Forecasting applications

library(forecast)

USgas_split <- ts_split(USgas, sample.out = 12)
train <- USgas_split$train
test <- USgas_split$test

md <- auto.arima(train, stepwise = FALSE, approximation = FALSE)
fc <- forecast(md, h = 12)

accuracy(fc, test)
##                     ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  -1.66563  98.53365  72.04795 -0.2777977 3.497193 0.6743267
## Test set     174.92468 196.47007 174.92468  7.0405671 7.040567 1.6371928
##                     ACF1 Theil's U
## Training set  0.02073135        NA
## Test set     -0.15231904 0.5813516
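
For reference, the main error measures in the table can be computed by hand. A minimal sketch using the usual definitions, with the error taken as actual minus forecast; forecast_errors is a hypothetical helper and the input numbers are made up.

```r
# Hypothetical helper: common point-forecast error measures.
forecast_errors <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  e <- actual - predicted  # forecast error
  c(
    ME   = mean(e),                       # mean error (bias)
    RMSE = sqrt(mean(e^2)),               # root mean squared error
    MAE  = mean(abs(e)),                  # mean absolute error
    MPE  = mean(e / actual) * 100,        # mean percentage error
    MAPE = mean(abs(e / actual)) * 100    # mean absolute percentage error
  )
}

# Made-up values for illustration
m <- forecast_errors(actual = c(100, 200, 400), predicted = c(110, 190, 380))
round(m, 3)
```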

Forecasting applications

test_forecast(actual = USgas, test = test, forecast.obj = fc)

Forecasting applications

md1 <- auto.arima(USgas, stepwise = FALSE, approximation = FALSE)
fc1 <- forecast(md1, h = 60)
plot_forecast(fc1)

Forecasting applications

# md2 <- ts_backtesting(ts.obj = USgas,
#                       h = 60,
#                       window_size = 12,
#                       periods = 6)

md2$leaderboard
##    Model_Name  avgMAPE    sdMAPE  avgRMSE   sdRMSE
## 1  auto.arima 6.148333 0.9886843 184.1150 15.82619
## 2      hybrid 6.713333 0.9228579 196.2300 21.57911
## 3       tbats 7.468333 0.7641313 208.5633 17.60290
## 4        bsts 7.551667 1.9287967 214.9950 44.75606
## 5 HoltWinters 7.736667 0.6686903 218.5017 19.05705
## 6         ets 7.778333 1.7040824 202.5533 45.95689
## 7      nnetar 8.033333 0.6963524 275.0717 11.80927
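
The idea behind such a leaderboard is backtesting: each model is trained and scored on several consecutive train/test partitions, and the scores are averaged. Below is a simplified base R sketch of an expanding-window backtest with two toy forecasters. The helper names (snaive_fc, mean_fc, backtest) are hypothetical, the arguments only loosely mirror window_size and periods, and this is not the ts_backtesting implementation.

```r
# Toy forecaster 1: seasonal naive, repeats the last observed cycle
snaive_fc <- function(train, h) {
  last_cycle <- tail(as.numeric(train), frequency(train))
  rep(last_cycle, length.out = h)
}

# Toy forecaster 2: the historical mean
mean_fc <- function(train, h) rep(mean(train), h)

# Score a forecaster on `periods` consecutive testing windows, each of
# length `window_size`, training on all observations before the window.
backtest <- function(x, forecaster, window_size = 12, periods = 6) {
  y <- as.numeric(x)
  n <- length(y)
  mape <- numeric(periods)
  for (i in seq_len(periods)) {
    test_end   <- n - (periods - i) * window_size
    test_start <- test_end - window_size + 1
    train <- ts(y[seq_len(test_start - 1)],
                start = start(x), frequency = frequency(x))
    pred   <- forecaster(train, window_size)
    actual <- y[test_start:test_end]
    mape[i] <- mean(abs((actual - pred) / actual)) * 100
  }
  c(avgMAPE = mean(mape), sdMAPE = sd(mape))
}

leaderboard <- rbind(
  snaive = backtest(AirPassengers, snaive_fc),
  mean   = backtest(AirPassengers, mean_fc)
)
# Rank the models by average error, best first
leaderboard <- leaderboard[order(leaderboard[, "avgMAPE"]), ]
leaderboard
```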

Forecasting applications

md2$summary_plot

Forecasting applications

# md3 <- ts_backtesting(ts.obj = USgas,
#                       periods = 6, 
#                       error = "RMSE",
#                       window_size = h1,
#                       h = h2,
#                       a.arg = list(stepwise = FALSE, 
#                                    approximation = FALSE,
#                                    D = 1),
#                       e.arg = list(opt.crit = "mse"),
#                       n.arg = list(P = 2, 
#                                    p = 1,
#                                    repeats = 100),
#                       h.arg = list(errorMethod = "RMSE",
#                                    verbose = FALSE))

Forecasting applications

md3$leaderboard
##    Model_Name  avgMAPE    sdMAPE  avgRMSE    sdRMSE
## 1  auto.arima 6.305000 0.5681813 187.9250  7.128455
## 2      hybrid 6.848333 0.6713990 197.4033 20.243566
## 3         ets 7.778333 1.7040824 202.5533 45.956891
## 4       tbats 7.468333 0.7641313 208.5633 17.602901
## 5      nnetar 6.716667 0.2035354 212.0533  2.190650
## 6        bsts 7.393333 2.0701948 212.6817 48.537247
## 7 HoltWinters 7.736667 0.6686903 218.5017 19.057048

Forecasting applications

md3$summary_plot

Forecasting applications

check_res(md3$Models_Final$auto.arima)
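
A residual check typically asks whether the residuals still carry autocorrelation. A rough base R equivalent, assuming a seasonal ARIMA fitted to the built-in AirPassengers series (the model order here is picked for illustration and is not the model selected above):

```r
# Fit an illustrative seasonal ARIMA model on the log scale
fit <- arima(log(AirPassengers),
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

# Ljung-Box test: a small p-value suggests leftover autocorrelation.
# fitdf is the number of estimated ARMA coefficients (two here).
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
lb$p.value

# Autocorrelation of the residuals, without drawing the plot
res_acf <- acf(res, lag.max = 24, plot = FALSE)
```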

Road map

ts_sim(model = md3$Models_Final$auto.arima, h = 60, n = 100)